Method and System for an Estimator Using Estimated Mean and Covariance

ABSTRACT

An estimator system where unknown values are represented as a plurality of vectors in multi-dimensional space is disclosed. The statistics of the vectors constitute a mean vector and a covariance matrix. A mean vector and a covariance matrix are estimated from a database of known values. Estimated values can then be predicted using the mean vector and the covariance matrix.

CROSS REFERENCE TO RELATED APPLICATION

This application is a continuation of and claims priority to U.S. patent application Ser. No. 12/705,932 filed on Feb. 15, 2010, entitled: “METHOD AND APPARATUS FOR A RECOMMENDER SYSTEM USING ESTIMATED MEAN AND COVARIANCE”.

BACKGROUND OF THE INVENTION

1. Field

The application relates to computer-based methods and apparatus for recommender systems.

2. Prior Art

A recommender system can be used to predict preferences of users for products. Consider a recommender system involving a plurality of products rated by a plurality of users. An observed rating is a rating of one of the products that has been made by one of the users. If a particular user has not yet rated a particular product then that rating is a missing rating. The product recommendation problem is to predict missing ratings.

For example, a user named Tony has rated three movies: Titanic as 4 stars, Star Wars as 2 stars, and The Godfather as 3 stars. Tony has not rated Independence Day and Jaws. A user named Mike has rated Titanic as 1 star and Jaws as 3 stars. Mike has not rated Star Wars, The Godfather or Independence Day. The set of Tony's observed ratings thus consists of his ratings of Titanic, Star Wars, and The Godfather. Mike's observed ratings thus consist of his ratings of Titanic and Jaws. The goal of movie recommendation is to accurately predict the missing ratings, which constitute the ratings Tony would give to Independence Day and Jaws and the ratings that Mike would give to Star Wars, The Godfather and Independence Day.

Sets of product ratings can be represented as a sparse data-matrix. Let k denote the number of products and let n denote the number of users. In the movie example above k=5 and n=2. Let y_(i)(j) denote the observed rating of the jth product by the ith user. Let y_(i) denote all observed ratings from the ith user. The data-matrix is the set of all observed ratings from all users and can be represented as 101 in FIG. 1. Positions in the data-matrix corresponding to missing ratings have no assigned value. For the movie example given above the data-matrix is 201 in FIG. 2. Performance of recommender systems is generally measured using a Root Mean Squared Error (RMSE). The RMSE is the square root of the mean squared difference between predicted ratings and observed ratings, where the observed ratings were not used in calculation of the predicted ratings.
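The data-matrix and the RMSE described above can be sketched as follows. This is a minimal illustration only: the use of NumPy, the encoding of missing ratings as `np.nan`, and the names `Y` and `rmse` are assumptions of this sketch and are not part of the disclosure.

```python
import numpy as np

# Products index the rows and users index the columns (k x n layout).
products = ["Titanic", "Star Wars", "The Godfather", "Independence Day", "Jaws"]
users = ["Tony", "Mike"]

# np.nan marks a missing rating (an assumed encoding for this sketch).
Y = np.full((len(products), len(users)), np.nan)
Y[0, 0], Y[1, 0], Y[2, 0] = 4, 2, 3   # Tony: Titanic, Star Wars, The Godfather
Y[0, 1], Y[4, 1] = 1, 3               # Mike: Titanic, Jaws


def rmse(predicted, observed):
    """Root mean squared error over held-out (predicted, observed) pairs."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))
```

Here `Y` has k=5 rows and n=2 columns, with five positions left unassigned.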

In addition to movies there are many similar products whose ratings are able to be predicted by recommender systems, e.g. books and television shows. Recommender systems are applicable in Internet dating sites to predict how users will rate potential mates. Grocery stores can use recommender systems to predict the buying habits of shoppers. Recommender systems are applicable in medicine. Consider a plurality of drugs and a plurality of patients. Consider a patient who has responded well to some drugs but not others. The drug response (positive or negative) is akin to a rating. Recommender systems can thus be applied to predict other drugs the patient will respond well to.

Recommender systems are currently an active area of research. An approach called matrix factorization currently stands out for large-scale, real-world, recommender systems. In this process an integer l is first chosen. Then the data-matrix of size k×n is approximated by a product of a left matrix 301 of size k×l and a right matrix 302 of size l×n. The left matrix 301 and right matrix 302 are generally estimated to minimize an error measure between their product and the data matrix 101. Once these matrices have been estimated, missing ratings of a particular given user are predicted by an inner product of the appropriate row of the left matrix and the appropriate column of the right matrix. For example, because rows of the k×n data-matrix correspond to products and columns correspond to users, y_(i)(j) is predicted by the dot product of the jth row of the left matrix 301 and the ith column of the right matrix 302.
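The prior-art prediction rule can be illustrated with a short sketch. The factor matrices below are random placeholders rather than estimated values, and the names `left`, `right` and `predict_rating` are hypothetical. Rows of the left matrix are indexed by product and columns of the right matrix by user, as the stated sizes k×l and l×n imply.

```python
import numpy as np

k, n, l = 5, 2, 3                      # products, users, chosen inner dimension
rng = np.random.default_rng(0)
left = rng.normal(size=(k, l))         # placeholder for the left matrix (k x l)
right = rng.normal(size=(l, n))        # placeholder for the right matrix (l x n)


def predict_rating(left, right, product, user):
    """Dot product of one row of the left matrix and one column of the right."""
    return float(left[product, :] @ right[:, user])


# The full k x n approximation of the data-matrix.
approx = left @ right
```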

The matrix factorization approach has a number of disadvantages. No single technique stands out for estimation of the left matrix 301 and the right matrix 302. A variety of approaches have been proposed that all result in slightly different values for the left matrix 301 and the right matrix 302. All of these approaches have different performance. There is also little guidance to help with selection of the integer l. Generally a variety of values of l need to be tried. In many cases a large l performs best. A large l, however, means that the left matrix 301 and the right matrix 302 are also very large and thus the approach requires estimation of many parameters, which is generally computationally expensive. It is generally difficult to add new products and new users to an existing matrix factorization system. Generally addition of new users and/or new products requires estimation of new versions of the left matrix 301 and the right matrix 302. Finally the performance of basic matrix factorization is generally too low for most applications. A number of ways have been proposed to increase the performance. It has been found that the performance can be increased by averaging predictions of a number of matrix factorization systems that use slightly different training approaches and/or different values of l. This approach may be too cumbersome for practical application. The performance of matrix factorization is also sometimes increased by applying a combination of pre-processing, post-processing and data manipulation steps. Such steps are generally heuristic in nature and may only work on particular data-sets. For example, subtraction of a global mean may improve performance for movie recommendation but not for a dating site.

A need exists for a method and apparatus that achieves high performance without the disadvantages of matrix factorization. Specifically the method and apparatus should be able to achieve high performance without requiring averaging of predictions from multiple systems. The method and apparatus should be able to add new products and new users without requiring extensive re-calculations. The method and apparatus should not rely on heuristic data manipulations to achieve high performance.

SUMMARY

The disadvantages of the art are alleviated to a great extent by a method and apparatus for product recommendation comprising

-   an estimation step whereby a mean vector and a covariance matrix are estimated from the data-matrix 101;
-   a prediction step whereby a missing product rating is predicted using the mean vector and the covariance matrix.

The estimation step whereby the mean vector and the covariance matrix are estimated from the observed data is a key issue. Let μ denote the k×1 mean vector and let R denote the k×k covariance matrix. If all ratings are observed and none are missing the mean and covariance can be estimated using standard techniques. For example, without missing ratings the mean vector can be estimated by an arithmetic mean, μ=(1/n)Σ_(t=1)^(n)y_(t), where y_(t) denotes the ratings from the tth user. Without missing ratings the covariance matrix can be estimated using a sample covariance R=(1/n)Σ_(t=1)^(n)(y_(t)−μ)(y_(t)−μ)′ where (y_(t)−μ)′ represents the transpose of (y_(t)−μ).
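For the complete-data case, the standard estimates can be sketched as below. The function name `complete_data_estimates` and the toy ratings are assumptions of this sketch. The ratings are centred on the estimated mean before the outer products are formed, which is the usual definition of a sample covariance.

```python
import numpy as np


def complete_data_estimates(Y):
    """Arithmetic mean and sample covariance for a k x n matrix with no gaps.

    Columns of Y are users and rows are products.
    """
    n = Y.shape[1]
    mu = Y.mean(axis=1)
    D = Y - mu[:, None]              # centred ratings, one column per user
    R = (D @ D.T) / n                # normalised by n, as in the text
    return mu, R


# Toy example: 2 products rated by 3 users, nothing missing.
Y_full = np.array([[4.0, 1.0, 3.0],
                   [2.0, 5.0, 2.0]])
mu_full, R_full = complete_data_estimates(Y_full)
```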

Missing ratings complicate the estimation step. The arithmetic mean and the sample covariance are no longer appropriate when some ratings are missing. Equations used here have been developed theoretically to estimate the mean vector and the covariance matrix when some ratings are missing. These equations have not been used previously in any application.

Once the mean vector and covariance matrix have been estimated then the prediction step predicts missing ratings. In one embodiment the prediction step uses a Minimum Mean Squared Error (MMSE) predictor. The MMSE predictor minimizes a theoretical squared difference between predicted ratings and their true, but unknown values.

The embodiments disclosed here achieve high performance without requiring averaging of predictions from multiple systems. Furthermore it is straightforward to add new products and new users. The embodiments disclosed here do not rely on heuristic data manipulations to achieve high performance.

BRIEF DESCRIPTION OF THE DRAWINGS

The method and apparatus will now be described in more detail, by way of example only, with reference to the accompanying drawings, in which:

FIG. 1 shows the data-matrix consisting of all observed ratings from all users;

FIG. 2 shows an example of a data-matrix;

FIG. 3 shows the prior-art matrix factorization approach;

FIG. 4 shows a recommender system and its connection via the Internet to users;

FIG. 5 is a flow chart of one embodiment of the estimation step;

FIG. 6 is a flow chart of one embodiment of the prediction step.

DETAILED DESCRIPTION

FIG. 4 shows a recommender system and its connection via Internet 103 to users 101 and 102. A Central Processing Unit (CPU) 105 retrieves and stores data to and from a Hard Disk Drive (HDD) 104 and Random Access Memory (RAM) 106. For example, the data-matrix 101 is stored on the HDD 104 but is copied to RAM 106 during the estimation step and the prediction step. After the estimation step is complete the mean and covariance are copied from RAM 106 to the HDD 104. The prediction step occurs when a user, e.g. user 101, requests prediction of his or her missing ratings. This would occur, for example, when the user 101 requests a movie recommendation. In the prediction step the mean and covariance are copied from the HDD 104 to RAM 106. In RAM 106 the mean and covariance are used to predict the requested missing ratings.

FIG. 5 is a flow chart representing the operation of the estimation step. The sub-steps within the estimation step are now specified.

Initialize Mean Vector and Covariance Matrix, Sub-Step 201

A recommender system starts without a mean vector or a covariance matrix, and these quantities must be initialized using the data-matrix 101. In one embodiment the initial value of the mean vector is provided by an arithmetic mean of the observed ratings. In one embodiment the initial value of the covariance matrix is provided by a novel matrix that has non-zero off-diagonal elements and diagonal elements equal to sample variances.

Mathematical formulas for initialization of the mean vector and the covariance matrix are now presented. As before, let μ denote the mean vector, let R denote the covariance matrix, and let y_(t) be the observed ratings from a tth user, where t is an integer from 1 to n. Let k_(t) be the number of ratings provided by the tth user, where 0≦k_(t)≦k. Let I be a k×k identity matrix. Let H_(y_t) be a k_(t)×k matrix given by I with rows corresponding to indices of missing ratings from the tth user deleted. Let N be a k×k diagonal matrix given by N=Σ_(t=1)^(n)H′_(y_t)H_(y_t), where ′ denotes vector or matrix transpose as appropriate. Elements along the diagonal of N thus equal the number of times each product was rated. The mean vector is initialized by μ=N⁻¹Σ_(t=1)^(n)H′_(y_t)y_(t). Define a matrix S=Σ_(t=1)^(n)H′_(y_t)(y_(t)−H_(y_t)μ)(y_(t)−H_(y_t)μ)′H_(y_t) constituting an un-normalized sample covariance matrix. The covariance matrix is initialized by R=N^(−1/2)SN^(−1/2).

Thus the input to sub-step 201 is the data-matrix 101. The outputs of sub-step 201 are initial estimates of the mean vector and covariance matrix.
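The initialization of sub-step 201 can be sketched as follows. The function name `initialize`, the `np.nan` encoding of missing ratings and the toy data are assumptions of this sketch. Rather than forming each selection matrix explicitly, the code zeroes the residuals at missing positions, which gives the same sums.

```python
import numpy as np


def initialize(Y):
    """Initial mean vector and covariance matrix from a k x n data-matrix.

    Y uses np.nan for missing ratings (an assumed encoding).
    """
    observed = ~np.isnan(Y)
    counts = observed.sum(axis=1).astype(float)     # diagonal of N
    if np.any(counts == 0):
        raise ValueError("every product needs at least one observed rating")
    # mu = N^-1 * sum_t H' y_t: per-product mean of the observed ratings.
    mu = np.where(observed, Y, 0.0).sum(axis=1) / counts
    # Column t holds H'(y_t - H mu): residuals, zero where the rating is missing.
    D = np.where(observed, Y - mu[:, None], 0.0)
    S = D @ D.T                                     # un-normalized sample covariance
    scale = 1.0 / np.sqrt(counts)                   # diagonal of N^(-1/2)
    R = scale[:, None] * S * scale[None, :]         # N^(-1/2) S N^(-1/2)
    return mu, R


# Toy example: 3 products, 4 users, each product rated three times.
Y_init = np.array([[4.0, 1.0, np.nan, 3.0],
                   [2.0, np.nan, 5.0, 2.0],
                   [np.nan, 3.0, 4.0, 1.0]])
mu_init, R_init = initialize(Y_init)
```

The diagonal of the result equals the per-product sample variances of the observed ratings, while the off-diagonal elements are generally non-zero. A product with no ratings makes N singular, so the sketch rejects that case.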

Update Mean Vector and Covariance Matrix, Sub-Step 202

The arithmetic mean of the observed ratings is a good initializer for the mean vector. Performance can be improved by updating an existing mean vector using Maximum Likelihood (ML) theory. With missing ratings, the ML estimate was obtained by McMichael and is a closed form expression given by

$\begin{matrix} {{\mu \left( {\sum\limits_{t = 1}^{n}\; {H_{y_{t}}^{\prime}R_{y_{t}}^{- 1}H_{y_{t}}}} \right)}^{- 1}{\sum\limits_{t = 1}^{n}\; {H_{y_{t}}^{\prime}R_{y_{t}}^{- 1}y_{t}}}} & (1) \end{matrix}$

where R_(y_t)=H_(y_t)RH′_(y_t). Note that this estimate requires the covariance matrix, so it cannot be used in sub-step 201. This theoretical estimate was first derived by McMichael and has not been applied previously to recommendation systems or otherwise.
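A sketch of the closed-form mean update of equation (1) is given below. The function name `update_mean` and the `np.nan` encoding are assumptions. The sub-matrix R_(y_t) is obtained by indexing rather than by multiplying with selection matrices.

```python
import numpy as np


def update_mean(Y, R):
    """ML mean estimate with missing ratings, following equation (1).

    Y is k x n with np.nan for missing ratings; R is the current k x k covariance.
    """
    k, n = Y.shape
    A = np.zeros((k, k))     # accumulates sum_t H' R_y^-1 H
    b = np.zeros(k)          # accumulates sum_t H' R_y^-1 y_t
    for t in range(n):
        idx = np.flatnonzero(~np.isnan(Y[:, t]))
        if idx.size == 0:
            continue         # a user with no ratings contributes nothing
        R_inv = np.linalg.inv(R[np.ix_(idx, idx)])
        A[np.ix_(idx, idx)] += R_inv
        b[idx] += R_inv @ Y[idx, t]
    return np.linalg.solve(A, b)
```

Two sanity checks follow from the formula: with R equal to the identity the update reduces to the arithmetic mean of the observed ratings, and with no missing ratings it reduces to the arithmetic mean for any R.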

No such closed-form ML estimate of the covariance matrix is known. Thus, existing values of the covariance matrix are updated using a modified gradient descent algorithm given by

$\begin{matrix} {R = R + \gamma R \left( \sum\limits_{t = 1}^{n} H_{y_{t}}^{\prime} \left( R_{y_{t}}^{-1} \left( y_{t} - \mu_{y_{t}} \right) \left( y_{t} - \mu_{y_{t}} \right)^{\prime} R_{y_{t}}^{-1} - R_{y_{t}}^{-1} \right) H_{y_{t}} \right) R} & (2) \end{matrix}$

where μ_(y_t)=H_(y_t)μ and γ>0 is a predetermined constant. This theoretical technique was also developed by McMichael and again has not been applied previously to recommendation systems or otherwise.
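One iteration of the covariance update can be sketched as below. The function name `update_covariance` is hypothetical. The step direction in this sketch is chosen so that a small step increases the likelihood, which is an assumption about the sign convention.

```python
import numpy as np


def update_covariance(Y, mu, R, gamma):
    """One modified-gradient step on the covariance matrix.

    Y is k x n with np.nan for missing ratings; gamma > 0 is the step size.
    """
    k, n = Y.shape
    G = np.zeros((k, k))
    for t in range(n):
        idx = np.flatnonzero(~np.isnan(Y[:, t]))
        if idx.size == 0:
            continue
        R_inv = np.linalg.inv(R[np.ix_(idx, idx)])
        v = R_inv @ (Y[idx, t] - mu[idx])            # R_y^-1 (y_t - mu_y)
        # H' (R_y^-1 d d' R_y^-1 - R_y^-1) H, scattered into the k x k sum.
        G[np.ix_(idx, idx)] += np.outer(v, v) - R_inv
    return R + gamma * (R @ G @ R)
```

As a check on the sign, when no ratings are missing and gamma=1/n a single step returns the sample covariance exactly, whatever the starting value of R.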

Thus the inputs to sub-step 202 of the estimation step are the data-matrix 101 and current estimates of the mean vector and the covariance matrix. The outputs of sub-step 202 are updated estimates of the mean vector and covariance matrix.

Check for Convergence, Sub-Step 203

The mean vector and the covariance matrix are continuously updated in the estimation step until a convergence criterion is satisfied. A likelihood is given by

$\begin{matrix} {{p\left( {{y^{n};\mu},R} \right)} = {\prod\limits_{t = 1}^{n}\; \frac{\exp - {\left( {y_{t} - \mu_{y_{t}}} \right)^{\prime}{{R_{y_{t}}^{- 1}\left( {y_{t} - \mu_{y_{t}}} \right)}/2}}}{\left( {2\; \pi} \right)^{k_{t}/2}{R_{y_{t}}}^{1\text{/}2}}}} & (3) \end{matrix}$

where y^(n)={y₁, . . . , y_(n)} represents all observed product ratings in the data-matrix 101. The convergence criterion is satisfied once changes in a likelihood calculated using successive estimates of the mean and covariance are sufficiently small. In this case a Boolean flag is set to indicate that convergence has occurred and sub-step 204 is executed. Otherwise the Boolean flag is not set and sub-step 202 is re-executed.

Thus the input to sub-step 203 is the data-matrix 101, updated estimates of the mean vector and the covariance matrix, and a likelihood of previous estimates of the mean and the covariance. The output of the sub-step 203 is the Boolean flag which represents whether convergence has occurred or not.
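The convergence check can be sketched by evaluating the logarithm of the likelihood in equation (3), which avoids numerical underflow of the product. The function names `log_likelihood` and `has_converged`, and the relative tolerance `tol`, are assumptions of this sketch.

```python
import numpy as np


def log_likelihood(Y, mu, R):
    """Log of the likelihood in equation (3); Y uses np.nan for missing ratings."""
    total = 0.0
    for t in range(Y.shape[1]):
        idx = np.flatnonzero(~np.isnan(Y[:, t]))
        if idx.size == 0:
            continue
        d = Y[idx, t] - mu[idx]
        R_y = R[np.ix_(idx, idx)]
        sign, logdet = np.linalg.slogdet(R_y)
        if sign <= 0:
            raise ValueError("covariance sub-matrix is not positive definite")
        quad = d @ np.linalg.solve(R_y, d)           # d' R_y^-1 d
        total += -0.5 * (quad + logdet + idx.size * np.log(2.0 * np.pi))
    return float(total)


def has_converged(previous, current, tol=1e-6):
    """Boolean flag: True once the change in log-likelihood is small."""
    return abs(current - previous) <= tol * max(1.0, abs(previous))
```

The flag returned by `has_converged` plays the role of the Boolean flag that selects between sub-step 204 and another pass through sub-step 202.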

Store Mean and Covariance Matrix, Sub-Step 204

In the estimation procedure the mean and covariance are generally processed in RAM 106. Once the estimation procedure has converged the mean and covariance are stored on the HDD 104.

FIG. 6 is a flow chart of the operation of one embodiment of the prediction step. In the prediction step the mean and covariance are used to predict missing ratings. The sub-steps within the prediction step are now specified.

Retrieve Mean and Covariance Matrix, Sub-Step 301

In sub-step 301 the mean and covariance are copied from the HDD 104 into RAM 106.

Thus the outputs of sub-step 301 are copies of the mean and covariance in RAM 106.

Retrieve Observed Product Ratings from the Given User, Sub-Step 302

Once a particular user, e.g. 101, generates a request for prediction of missing ratings, the particular user's observed ratings are retrieved from the HDD 104 and copied into RAM 106.

Thus the outputs of sub-step 302 of the prediction step are copies of the user's observed ratings in RAM 106.

Predict Required Missing Product Ratings for the Given User, Sub-Step 303

Sub-step 303 provides a final answer of the disclosed method and apparatus. This final answer constitutes a prediction of a plurality of required missing product ratings for a given user. Let z denote a plurality of observed ratings from the given user. Let x̂ be the prediction of the missing product ratings of the given user. In one embodiment this prediction is obtained by

$\begin{matrix} {\hat{x} = R_{xz} R_{z}^{-1} \left( z - \mu_{z} \right) + \mu_{x}} & (4) \end{matrix}$

where R_(xz) and R_(z) are appropriate sub-matrices of the covariance matrix and μ_(x) and μ_(z) are appropriate sub-vectors of the mean vector. We now specify these. Let H_(z) be a matrix given by I with rows corresponding to indices of the given user's missing ratings deleted, so that only the rows of the observed ratings z remain. Let H_(x) be a matrix given by I with all rows deleted except those corresponding to indices of the missing ratings predicted in x̂. Then R_(xz)=H_(x)RH′_(z), R_(z)=H_(z)RH′_(z), μ_(x)=H_(x)μ, and μ_(z)=H_(z)μ.

Thus the inputs to sub-step 303 are the mean vector, the covariance matrix and the observed ratings of a given user. The output of sub-step 303 is the prediction of the particular user's missing ratings.
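The prediction of equation (4) can be sketched as follows. The function name `predict_missing` and the convention of passing a length-k vector with `np.nan` at the missing positions are assumptions. The sketch predicts all of the user's missing ratings at once and falls back to the mean vector for a user with no observed ratings.

```python
import numpy as np


def predict_missing(ratings, mu, R):
    """MMSE prediction of a user's missing ratings, following equation (4).

    ratings is a length-k vector with np.nan where the rating is missing.
    Returns the indices of the missing ratings and their predictions.
    """
    obs = np.flatnonzero(~np.isnan(ratings))
    mis = np.flatnonzero(np.isnan(ratings))
    x_hat = mu[mis].copy()                           # mu_x
    if obs.size:
        R_xz = R[np.ix_(mis, obs)]
        R_z = R[np.ix_(obs, obs)]
        x_hat = x_hat + R_xz @ np.linalg.solve(R_z, ratings[obs] - mu[obs])
    return mis, x_hat


# Toy example: two positively correlated products, first one rated 4 stars.
mu_toy = np.array([3.0, 3.0])
R_toy = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
missing_idx, prediction = predict_missing(np.array([4.0, np.nan]), mu_toy, R_toy)
```

In the toy example the observed rating is one star above its mean and the covariance between the two products is 0.5, so the prediction for the second product is 3.5.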

Implementation

In one embodiment implementation can be performed in a programming language such as Matlab or C or any other programming language. The programming language can be augmented by using Basic Linear Algebra Subprograms (BLAS). For efficient memory usage BLAS routines can operate on a designated memory location in random access memory (RAM) 106. The designated memory can be large enough to hold a k×k matrix. The designated k×k memory location can hold a k_(t)×k_(t) matrix when processing observations from a tth user.

To illustrate the sequence of BLAS routines consider, for example, an iteration of sub-step 202. For observed ratings from the tth user, a matrix R_(y_t)=H_(y_t)RH′_(y_t) can be formed by copying relevant elements of the covariance matrix into the designated memory location. In this and other similar operations matrix multiplications are not required. The matrix R_(y_t) can be overwritten by its upper triangular Cholesky decomposition U_(y_t) where R_(y_t)=U_(y_t)U′_(y_t). Cholesky decomposition can be performed using a routine called “spotrf”, which is provided by LAPACK, a library built on top of BLAS, and which requires a symmetric positive-definite matrix. A BLAS routine called “strsm” can be used to calculate U_(y_t)⁻¹(y_(t)−μ_(y_t)), followed by another call to “strsm” to calculate R_(y_t)⁻¹(y_(t)−μ_(y_t)). In a similar fashion two calls to “strsm” can be used to calculate R_(y_t)⁻¹y_(t). A matrix R_(y_t)⁻¹(y_(t)−μ_(y_t))(y_(t)−μ_(y_t))′R_(y_t)⁻¹ can be calculated using a BLAS routine called “ssyrk”. A matrix R_(y_t)⁻¹ can be calculated from the Cholesky factor using a LAPACK routine called “spotri”.

The BLAS function calls can also be used to calculate the quantities in sub-step 203. The required scalar (y_(t)−μ_(y_t))′R_(y_t)⁻¹(y_(t)−μ_(y_t)) can be calculated by squaring and summing the elements of U_(y_t)⁻¹(y_(t)−μ_(y_t)). The required determinant can be calculated using the identity log|R_(y_t)|=2Σ_(j) log((U_(y_t))_(jj)) where (U_(y_t))_(jj) is the jth diagonal element of U_(y_t).
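The Cholesky-based calculations can be illustrated at a higher level with NumPy, which delegates to LAPACK internally. This is a sketch rather than the BLAS call sequence itself: `np.linalg.cholesky` returns a lower triangular factor L with R_(y_t)=LL′, which plays the role of the triangular factor in the text, and the general-purpose `np.linalg.solve` stands in for the triangular solves. The function name `cholesky_terms` is hypothetical.

```python
import numpy as np


def cholesky_terms(R_y, d):
    """Return R_y^-1 d, the quadratic form d' R_y^-1 d, and log|R_y|.

    R_y must be symmetric positive definite; d is the residual y_t - mu_y.
    """
    L = np.linalg.cholesky(R_y)          # lower triangular, R_y = L L'
    w = np.linalg.solve(L, d)            # first solve: L^-1 d
    v = np.linalg.solve(L.T, w)          # second solve: R_y^-1 d
    quad = float(w @ w)                  # squared and summed elements of L^-1 d
    logdet = float(2.0 * np.sum(np.log(np.diag(L))))
    return v, quad, logdet


R_example = np.array([[4.0, 2.0],
                      [2.0, 3.0]])
d_example = np.array([1.0, 2.0])
v_example, quad_example, logdet_example = cholesky_terms(R_example, d_example)
```

A production implementation would use a dedicated triangular solver so that the triangular structure of the factor is exploited.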

Alternative Embodiments

Initialize Mean Vector and Covariance Matrix, Sub-Step 201

In other embodiments different initializations for the covariance matrix can be used. For example, a simple such initialization is R=I where I is a k×k identity matrix. Another alternative is a diagonal matrix whose elements correspond to sample variances of the observed ratings. The covariance matrix is thus R=N⁻¹diag(S) where diag(S) is a diagonal matrix consisting of the diagonal elements of S. Another alternative is R=diag(S)^(−1/2)S diag(S)^(−1/2). Note that off-diagonal elements of this matrix are non-zero.

Update Mean Vector and Covariance Matrix, Sub-Step 202

Additional embodiments in the form of alternative methods to update the covariance matrix are possible. For example, a method based on an Expectation Maximization (EM) algorithm is possible. Iterations of the EM algorithm are guaranteed to not decrease the likelihood. Let X_(t) denote a (k−k_(t))-dimensional random vector representing all missing ratings of the tth user. Let H_(x_t) be a (k−k_(t))×k matrix given by I with rows corresponding to the indices of the observed ratings of the tth user deleted. The conditional mean prediction of all missing ratings is given by

$\begin{matrix} {{\hat{X}}_{t} = R_{x_{t}y_{t}} R_{y_{t}}^{-1} \left( y_{t} - \mu_{y_{t}} \right) + \mu_{x_{t}}} & (5) \end{matrix}$

where R_(x_t y_t)=H_(x_t)RH′_(y_t), μ_(x_t)=H_(x_t)μ and μ_(y_t)=H_(y_t)μ. Then the EM iteration that provides the updated covariance matrix estimate is given by

$\begin{matrix} {R = \frac{1}{n} \sum\limits_{t = 1}^{n} \left( \left( {\hat{Z}}_{t} - \mu \right) \left( {\hat{Z}}_{t} - \mu \right)^{\prime} + H_{x_{t}}^{\prime} \left( R_{x_{t}} - R_{x_{t}y_{t}} R_{y_{t}}^{-1} R_{x_{t}y_{t}}^{\prime} \right) H_{x_{t}} \right)} & (6) \end{matrix}$

where Ẑ_(t)=H′_(y_t)y_(t)+H′_(x_t)X̂_(t) and R_(x_t)=H_(x_t)RH′_(x_t). Note that the EM approach can also be applied to calculate the mean vector, resulting in μ=(1/n)Σ_(t=1)^(n)Ẑ_(t).
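One EM iteration can be sketched as follows. The function name `em_step` is hypothetical. The conditional covariance of each user's missing ratings is scattered into the k×k positions of those ratings and accumulated inside the sum over users. The sketch centres the completed vectors on the newly computed mean, which is the usual M-step and is an assumption of this sketch.

```python
import numpy as np


def em_step(Y, mu, R):
    """One EM update of the mean and covariance; Y uses np.nan for missing."""
    k, n = Y.shape
    Z_hat = np.empty((k, n))             # completed rating vectors
    C = np.zeros((k, k))                 # summed conditional covariances
    for t in range(n):
        obs = np.flatnonzero(~np.isnan(Y[:, t]))
        mis = np.flatnonzero(np.isnan(Y[:, t]))
        Z_hat[obs, t] = Y[obs, t]
        if mis.size == 0:
            continue
        R_x = R[np.ix_(mis, mis)]
        if obs.size == 0:
            # Nothing observed: the prediction is the mean, the covariance is R_x.
            Z_hat[mis, t] = mu[mis]
            C[np.ix_(mis, mis)] += R_x
            continue
        R_xy = R[np.ix_(mis, obs)]
        R_y = R[np.ix_(obs, obs)]
        gain = np.linalg.solve(R_y, R_xy.T).T        # R_xy R_y^-1 (R_y symmetric)
        Z_hat[mis, t] = mu[mis] + gain @ (Y[obs, t] - mu[obs])   # equation (5)
        C[np.ix_(mis, mis)] += R_x - gain @ R_xy.T
    new_mu = Z_hat.mean(axis=1)
    D = Z_hat - new_mu[:, None]
    new_R = (D @ D.T + C) / n
    return new_mu, new_R
```

With no missing ratings the step reduces to the arithmetic mean and the sample covariance.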

In another embodiment the sub-steps 202 and 203 are dispensed with. In this embodiment the initial values of the mean vector and covariance matrix from 201 are used in the prediction step.

Check for Convergence, Sub-Step 203

Alternative embodiments in the form of different convergence criteria are possible. For example, iterations can be ceased once changes in the covariance matrix and mean vector are sufficiently small. In another example, iterations can be ceased once the change in RMSE on a test set is sufficiently small. In another example, iterations can be ceased after an arbitrary number of iterations.

Calculation of Confidence

Alternative embodiments in the form of additional functionality for calculating the confidence in the predictions are possible. The covariance matrix of the missing ratings prediction is given by

$\begin{matrix} {R_{x_{t}} - R_{x_{t}y_{t}} R_{y_{t}}^{-1} R_{x_{t}y_{t}}^{\prime}} & (7) \end{matrix}$

Knowing this allows us to measure the confidence of the prediction. The predictions with the lowest variance are those in which we have the most confidence. This could be used, for example, to provide a measure of the quality of predictions.
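The confidence calculation of equation (7) can be sketched as below. The function name `prediction_covariance` and the Boolean mask argument are assumptions. The diagonal of the returned matrix gives the variance of each predicted rating.

```python
import numpy as np


def prediction_covariance(observed_mask, R):
    """Covariance of the predicted missing ratings, following equation (7).

    observed_mask is a length-k Boolean sequence, True where a rating exists.
    """
    mask = np.asarray(observed_mask, dtype=bool)
    obs = np.flatnonzero(mask)
    mis = np.flatnonzero(~mask)
    R_x = R[np.ix_(mis, mis)]
    if obs.size == 0:
        return mis, R_x                  # nothing observed: no reduction
    R_xy = R[np.ix_(mis, obs)]
    R_y = R[np.ix_(obs, obs)]
    return mis, R_x - R_xy @ np.linalg.solve(R_y, R_xy.T)


R_conf = np.array([[1.0, 0.5],
                   [0.5, 1.0]])
missing_conf, cov_conf = prediction_covariance([True, False], R_conf)
```

In this toy case observing the first product reduces the variance of the second product's prediction from 1 to 0.75.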

Adding New Movies and Users

Alternative embodiments in the form of functionality facilitating the addition of new products and new users are possible. Prediction of ratings for new users can be performed using the existing values of the mean and covariance. No re-estimation is required. A new product can be added to the system by increasing the dimension of the covariance matrix and mean vector. This can be done using the existing covariance and mean vector. The initial values of the extra elements can be set using a similar idea to that used in sub-step 201 and its alternative embodiments.

Incorporation of Technological Advances

Additional embodiments are possible that take advantage of changes in technology. As the size of available RAM 106 increases on machines, more data can be stored in RAM. If the RAM is sufficiently large it may negate the requirement for continual retrieval from the HDD 104. Furthermore, as Graphics Processing Units (GPUs) become increasingly popular many of the operations can be performed on a GPU rather than a CPU.

Advantages

The embodiments of the method and apparatus disclosed here alleviate to a great extent disadvantages associated with the prior-art. The prior-art matrix factorization approach is poorly defined and it can be difficult to repeat results obtained by others. In contrast to the prior-art the methods presented here are well defined and repeatable.

The embodiments disclosed here also do not require the specification of many parameters. Thus the embodiments will perform well “out of the box” without requiring optimization over many unknown parameters.

In the prior-art the addition of new users and new products required extensive re-estimation. In the embodiments of the method and apparatus disclosed here, the prediction of ratings for new users requires no re-estimation whatsoever. Adding new products requires increasing the dimension of the mean and covariance. This is a simple procedure and is substantially easier than what is required in the prior-art.

The performance of the embodiments disclosed here is better than that of the prior-art. The best way to compare performance is to test approaches using the same data. An example of an appropriate data-set is the Netflix data which contains a plurality of movie ratings from a plurality of users. The first embodiment achieves an RMSE of 0.8907 on the Netflix data. It is reported that plain matrix factorization achieves a significantly worse RMSE of 0.902 on the same data-set.

CONCLUSIONS, RAMIFICATIONS AND SCOPE

From the foregoing description, it will thus be evident that the present application details a method and apparatus for product recommendation. As various changes can be made in the above embodiments and operating methods without departing from the spirit or scope of the following claims, it is intended that all matter contained in the above description or shown in the accompanying drawings should be interpreted as illustrative and not in a limiting sense.

Variations or modifications to the design and construction of this invention, within the scope of the appended claims, may occur to those skilled in the art upon reviewing the disclosure herein. Such variations or modifications, if within the spirit of this invention, are intended to be encompassed within the scope of any claims to patent protection issuing upon this invention.

What is claimed is:
 1. A method of predicting an unknown value comprising the steps of: retaining a set of observed values on a computer storage device; accessing the set of observed values from the computer storage device; calculating a mean vector, via a computer processor; estimating a single covariance matrix from the set of observed values; initializing the covariance matrix using the equation R=N^(−1/2)SN^(−1/2), where S denotes an un-normalized sample covariance matrix and elements of the diagonal matrix N denote the number of times each observed value was observed, predicting the unknown value using the mean vector and the covariance matrix when the unknown value is absent from the set; and providing the predicted value to an end user.
 2. The method of claim 1, wherein the mean vector is estimated using a stochastic gradient descent approach.
 3. The method of claim 1, wherein the covariance matrix is estimated using a stochastic gradient descent approach.
 4. The method of claim 1, wherein the prediction of an unknown value is by using a minimum mean squared error predictor.
 5. The method of claim 1, wherein calculations are performed using basic linear algebra subroutines.
 6. The method of claim 1, wherein the covariance matrix is estimated using an expectation maximization algorithm.
 7. The method of claim 1, further comprising the step of continuously updating the estimation of the covariance matrix.
 8. The method of claim 7, further comprising the step of checking for convergence and if there is convergence halting the continuous updating of the estimation of the covariance matrix and saving the estimated mean vector and covariance matrix.
 9. The method of claim 1, further comprising the step of calculating a confidence rating from at least one portion of the covariance matrix.
 10. A system for predicting an unknown value comprising: at least one processor with a memory and in communication with at least one database, wherein the database includes a set of observed values; at least one application stored in the memory and capable of being executed by the processor to perform the following operations: accessing the set of observed values from the computer storage device; calculating a mean vector; estimating a single covariance matrix from the set of observed values; initializing the covariance matrix using the equation R=N^(−1/2)SN^(−1/2), where S denotes an un-normalized sample covariance matrix and elements of the diagonal matrix N denote the number of times each observed value was observed, predicting the unknown value using the mean vector and the covariance matrix when the unknown value is absent from the set; and providing the predicted value to an end user.
 11. The system of claim 10, wherein at least one portion of the covariance matrix is used to calculate a confidence rating.
 12. A method of predicting an unknown value comprising the steps of: retaining a set of observed values on a computer storage device; accessing the set from the computer storage device; estimating a covariance matrix from the set of observed values using the equation R=N^(−1/2)SN^(−1/2); predicting the unknown value using the covariance matrix when the unknown value is absent from the set; and providing the predicted value to an end user.
 13. The method of claim 12, further comprising the step of calculating a confidence rating from at least one portion of the covariance matrix.
 14. A system for predicting an unknown value comprising: at least one processor with a memory and in communication with at least one database, wherein the database includes a set of observed values; at least one application stored in the memory and capable of being executed by the processor to perform the following operations: accessing the set of observed values from the computer storage device; calculating a mean vector; estimating a covariance matrix using a sample covariance matrix, wherein the sample covariance matrix is pre-multiplied and post-multiplied by a diagonal matrix whose elements denote the inverse of the square root of the number of times each observed value was observed; predicting the unknown value using the mean vector and the covariance matrix when the unknown value is absent from the set; and providing the predicted value to an end user.
 15. A method of predicting an unknown value comprising the steps of: retaining a set of observed values on a computer storage device; accessing the set from the computer storage device; estimating a covariance matrix given by a sample covariance matrix, wherein the sample covariance matrix is pre-multiplied and post-multiplied by a diagonal matrix whose elements denote the inverse of the square root of the number of times each observed value was observed; predicting the unknown value using the covariance matrix when the unknown value is absent from the set; and providing the predicted value to an end user.